SQL Server 2008 : Working with Indexes

10/17/2010 5:44:25 PM

Creating Indexes

An index is a lookup structure created on a table to optimize sort and query performance. Indexes are created on a particular column or columns and store the data values for this column or columns in order. When raw underlying table data is stored in no particular order, this situation is referred to as a heap. The heap is composed of multiple pages, with each page containing multiple table rows. When raw underlying data is stored in order, sorted by a column or columns, this situation is referred to as a clustered index. For example, if you have a table named Customer, with a clustered index on the FullName column, the rows in this table will be stored in order, sorted by the full name. This means that when you are searching for a particular full name, the query optimizer component can execute the query more efficiently by performing an index lookup rather than a table scan. Only one clustered index is allowed per table; usually this is created on the column designated as the PRIMARY KEY.
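As a minimal sketch of the Customer example above, the following statements create a clustered index on FullName and then search by it. Only the table name and the FullName column come from the text; the other columns, the index name, and the search value are assumptions made for illustration.

```sql
-- Hypothetical Customer table; only FullName is named in the text.
CREATE TABLE Customer
(CustomerID int NOT NULL,
 FullName varchar(100) NOT NULL,
 City varchar(50) NULL);
GO
-- Store the table rows in order of FullName (one clustered index per table).
CREATE CLUSTERED INDEX Ix_Customer_FullName
ON Customer (FullName);
GO
-- A search on the clustering column can use the index instead of a table scan.
SELECT CustomerID, FullName, City
FROM Customer
WHERE FullName = 'Jane Smith';
GO
```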

You can also create additional nonclustered indexes on a table that is stored either as a heap or as a clustered index. A nonclustered index is a separate lookup structure that stores index values in order, and with each index value, it stores a pointer to the row with this index value (a row identifier for a heap, or the clustered index key for a table with a clustered index). Nonclustered indexes speed up data retrieval. It makes sense to create nonclustered indexes on frequently searched columns in a table. The trade-off with indexes is write performance. Every time a row is inserted, updated, or deleted, the affected indexes must also be updated. When writing data to a table with nonclustered indexes, sometimes the pages within the index have to be rearranged to make room for the new values. In addition, indexes are storage structures that take up disk space. Indexes are created using the CREATE INDEX statement. Example 1 shows the syntax for creating an index.

Example 1. CREATE INDEX Statement—Syntax

CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
ON table_or_view ( column1 [ ASC | DESC ], column2, ...n)
[ INCLUDE (additional_column_name, ...n) ]
[ WHERE filter_clause ]
[ WITH ( index_option [ ,...n ] ) ]

The CREATE INDEX statement creates a clustered or nonclustered index on a specified column or columns. You can choose to create the index as UNIQUE, which will enforce a UNIQUE constraint on the index columns. A filter_clause can be specified to create indexes only on a subset of data that meets specific criteria. This is useful for a very large table, where creating an index on all values of a particular column will be impractical. Table 1 summarizes index options that can be used with the CREATE INDEX statement.
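The UNIQUE, INCLUDE, and WHERE clauses described above might be combined as in the following sketch. The Orders table and every name in it are hypothetical and exist only to illustrate the syntax.

```sql
-- Hypothetical Orders table used only for illustration.
CREATE TABLE Orders
(OrderID int NOT NULL,
 OrderNumber varchar(20) NOT NULL,
 CustomerID int NOT NULL,
 OrderDate datetime NOT NULL,
 ShippedDate datetime NULL);
GO
-- UNIQUE: a second row with the same OrderNumber is rejected.
CREATE UNIQUE NONCLUSTERED INDEX Ix_Orders_OrderNumber
ON Orders (OrderNumber);
GO
-- Filtered index: only rows for unshipped orders are indexed.
-- INCLUDE stores OrderDate alongside the key so it can be read from the index.
CREATE NONCLUSTERED INDEX Ix_Orders_Unshipped
ON Orders (CustomerID)
INCLUDE (OrderDate)
WHERE ShippedDate IS NULL;
GO
-- A query whose predicate matches the filter is a candidate for the filtered index.
SELECT CustomerID, OrderDate
FROM Orders
WHERE ShippedDate IS NULL
  AND CustomerID = 42;
GO
```

Because the filtered index covers only unshipped orders, it should stay much smaller than an index over every row, which is the benefit the filter_clause is meant to provide on very large tables.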

Table 1. Index Options
Option	Explanation
PAD_INDEX = ON | OFF	When this option is ON, the free space percentage specified by the FILLFACTOR parameter is also applied to the intermediate-level pages of the index. This allows new values to be inserted without rearranging a large amount of data. When this option is OFF, the intermediate-level pages are filled to near capacity, leaving enough free space for at least one row.
FILLFACTOR = fill factor percentage	Specifies the percentage of each leaf-level page that should be filled up with data. For example, a fill factor of 80 means 20% of each page will be empty and available for new data. The fill factor is used only when you create or rebuild an index. Fill factor and index padding are discussed in detail in
SORT_IN_TEMPDB = ON | OFF	Specifies whether the intermediate sort results used to build the index should be stored in the tempdb database instead of the current database. This may give performance advantages if the tempdb database is stored on a different disk from the current database.
IGNORE_DUP_KEY = ON | OFF	Specifies what happens when an INSERT attempts to add duplicate key values to a unique index. When ON, a warning is issued and only the duplicate rows are rejected. When OFF, an error is raised and the entire INSERT is rolled back.
STATISTICS_NORECOMPUTE = ON | OFF	When ON, specifies that out-of-date optimization statistics for the index should not be recomputed automatically.
DROP_EXISTING = ON | OFF	Specifies that the existing index with the same name should be dropped and then be re-created. This equates to an index rebuild.
ONLINE = ON | OFF	Specifies that the underlying table should remain online and accessible by users while the index is being built. This option is only available in SQL Server 2008 Enterprise or Developer edition.
ALLOW_ROW_LOCKS = ON | OFF	Specifies whether row locks are allowed when accessing the index.
ALLOW_PAGE_LOCKS = ON | OFF	Specifies whether page locks are allowed when accessing the index.
MAXDOP = max_degree_of_parallelism	Specifies the maximum number of processors that are to be used during the index operation.
DATA_COMPRESSION = NONE | ROW | PAGE	Specifies row-level or page-level data compression for the index.

Example 2 creates a clustered index (by star name) and a nonclustered index (by star type) on the Stars table we created in the previous example. Figure 1 shows how the Ix_Star_Name index can be created using the interface of SQL Server Management Studio.

Example 2. Working with Indexes

--Create the table, specifying that the PRIMARY KEY index is to be created
--as nonclustered
CREATE TABLE Stars
(StarID int PRIMARY KEY NONCLUSTERED,
StarName varchar(50) UNIQUE,
SolarMass decimal(10,2) CHECK(SolarMass > 0),
StarType varchar(50) DEFAULT 'Orange Giant');
GO
--Create the clustered index on StarName, filling each page to 70%
--(ONLINE = ON requires Enterprise or Developer edition)
CREATE CLUSTERED INDEX Ix_Star_Name
ON Stars(StarName)
WITH (PAD_INDEX = ON,
FILLFACTOR = 70,
ONLINE = ON);
GO
--Create a nonclustered index on StarType, filling each page to 90%
CREATE NONCLUSTERED INDEX Ix_Star_Type
ON Stars (StarType)
WITH (PAD_INDEX = ON,
FILLFACTOR = 90);
GO
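As a further sketch, the DROP_EXISTING option from Table 1 can be used to re-create the Ix_Star_Type index with different settings in a single statement. The fill factor of 80 and the use of SORT_IN_TEMPDB are arbitrary choices for illustration, not recommendations.

```sql
-- Re-create Ix_Star_Type in one step, this time leaving 20% of each page free.
CREATE NONCLUSTERED INDEX Ix_Star_Type
ON Stars (StarType)
WITH (DROP_EXISTING = ON,
      PAD_INDEX = ON,
      FILLFACTOR = 80,
      SORT_IN_TEMPDB = ON);
GO
```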

Figure 1. Creating an Index Using SQL Server Management Studio

When you are creating a PRIMARY KEY constraint, an index on the column(s) designated as PRIMARY KEY will be created automatically. This index will be clustered by default, but this can be overridden when creating the constraint by specifying the PRIMARY KEY NONCLUSTERED option. As a best practice, it is recommended that you accept the default of the clustered PRIMARY KEY column, unless you have a specific reason to designate another column as the clustered index key. Usually, the automatically created index is named PK__TableName__<Unique Number>, but this can be changed at any time by renaming the index. For example, a newly created Stars table with a PRIMARY KEY of StarID automatically has an index with a name such as PK__Stars__06ABC6465F9E293D, while the UNIQUE constraint on StarName produces an index with a name such as UQ__Stars__A4B8A52A5CC1BC92.

Warning

Remember that when creating a table, a unique index will be automatically created on the columns designated as the PRIMARY KEY. If you wish to avoid the long rebuild time associated with building a clustered index, or if you wish to create the clustered index on a column different from the PRIMARY KEY, you must explicitly specify the PRIMARY KEY NONCLUSTERED option. The PRIMARY KEY will always be unique.
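The following sketch shows one way to inspect the automatically generated index names, and how naming the constraint explicitly avoids a generated name. The Planets table and its constraint and index names are hypothetical.

```sql
-- List the indexes on Stars, including any automatically generated names.
SELECT name, type_desc, is_primary_key, is_unique_constraint
FROM sys.indexes
WHERE object_id = OBJECT_ID('Stars');
GO
-- Hypothetical Planets table: an explicit constraint name replaces the
-- generated PK__... name, and NONCLUSTERED leaves the clustered index free.
CREATE TABLE Planets
(PlanetID int NOT NULL
   CONSTRAINT PK_Planets PRIMARY KEY NONCLUSTERED,
 PlanetName varchar(50) NOT NULL);
GO
-- The clustered index is created on a column other than the PRIMARY KEY.
CREATE CLUSTERED INDEX Ix_Planet_Name
ON Planets (PlanetName);
GO
```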

Working with Full-Text Indexes

Standard indexes are great when used with the simple WHERE clause of the SELECT statement. An index will greatly reduce the time it will take you to locate rows where the indexed column is equal to a certain value, or when this column starts with a certain value. However, standard indexes are inadequate for fulfilling more complex text-based queries. For example, creating an index on StarType will not help you find all rows where the StarType column contains the word “giant,” but not the word “supermassive”.
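The difference can be sketched with a few queries against the Stars table, assuming the Ix_Star_Type index on StarType exists. The search values are illustrative.

```sql
-- These predicates fix the start of the value, so an index on StarType can help.
SELECT StarName FROM Stars WHERE StarType = 'Orange Giant';
SELECT StarName FROM Stars WHERE StarType LIKE 'Orange%';
GO
-- A leading wildcard means every StarType value has to be examined,
-- so the ordering of the index cannot be used to narrow the search.
SELECT StarName
FROM Stars
WHERE StarType LIKE '%giant%'
  AND StarType NOT LIKE '%supermassive%';
GO
```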

To fulfill these types of queries, you must use full-text indexes. Full-text indexes are complex structures that consolidate the words used in a column and their relative weight and position, and link these words with the database page containing the actual data. Full-text indexes are built using a dedicated component of SQL Server 2008—the Full-Text Engine. In SQL Server 2005 and earlier, the Full-Text Engine was its own service, known as full-text search. In SQL Server 2008, the Full-Text Engine is part of the database engine (running as the SQL Server Service).

Full-text indexes can be stored on a separate filegroup. This can deliver performance improvements, if this filegroup is hosted on a separate disk from the rest of the database. Only one full-text index can be created on a table, and the table must have a unique index on a single column that does not allow null values; this index is specified as the KEY INDEX of the full-text index. Full-text indexes must be based on columns of type char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, or varbinary(max). You must specify a type column when creating a full-text index on an image, varbinary, or varbinary(max) column. The type column stores the file extension (.docx, .pdf, .xlsx) of the document stored in the indexed column.
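A type column might be declared as in the sketch below. The Documents table, the DocumentCatalog catalog, and all column names are assumptions introduced for illustration.

```sql
-- Hypothetical table holding binary documents and their file extensions.
CREATE TABLE Documents
(DocumentID int NOT NULL
   CONSTRAINT PK_Documents PRIMARY KEY,
 FileExtension varchar(10) NOT NULL,   -- for example '.docx' or '.pdf'
 Content varbinary(max) NULL);
GO
CREATE FULLTEXT CATALOG DocumentCatalog;
GO
-- TYPE COLUMN names the column that holds each document's file extension.
-- KEY INDEX names the unique, single-column, non-nullable index on the table.
CREATE FULLTEXT INDEX ON Documents
(Content TYPE COLUMN FileExtension)
KEY INDEX PK_Documents
ON DocumentCatalog;
GO
```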

Example 3 amends the Stars table to include a Description column and creates a full-text index on this column. The FREETEXT predicate allows us to search on any of the words specified using the full-text index. This yields a user experience similar to using an Internet search engine.

Example 3. Creating and Using a Full-Text Index

ALTER TABLE Stars
ADD Description ntext DEFAULT 'No description specified' NOT NULL ;
GO
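CREATE FULLTEXT CATALOG FullTextCatalog AS DEFAULT;
--The KEY INDEX name is generated by SQL Server and differs on every system;
--substitute the name of the PRIMARY KEY index on your Stars table
CREATE FULLTEXT INDEX ON Stars (Description)
KEY INDEX PK__Stars__06ABC6465F9E293D;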
GO
UPDATE Stars SET Description = 'Deneb is the brightest star in the
constellation Cygnus and one of the vertices of the Summer Triangle. It is
the 19th brightest star in the night sky, with an apparent magnitude of 1.25.
A white supergiant, Deneb is also one of the most luminous stars known. It
is, or has been, known by a number of other traditional names, including
Arided and Aridif, but today these are almost entirely forgotten. Courtesy
Wikipedia.'
WHERE StarName = 'Deneb';
UPDATE Stars SET Description = 'Pollux, also cataloged as Beta Geminorum,
is an orange giant star approximately 34 light-years away in the constellation
of Gemini (the Twins). Pollux is the brightest star in the constellation
(brighter than Castor (Alpha Geminorum). As of 2006, Pollux was confirmed to
have an extrasolar planet orbiting it. Courtesy Wikipedia.'
WHERE StarName = 'Pollux';
GO
SELECT StarName
FROM Stars
WHERE FREETEXT (Description, 'planet orbit, giant');
GO
-- Results:
-- StarName
-- --------------------------------------------------
-- Pollux
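The earlier requirement, finding rows that contain the word "giant" but not the word "supermassive", could be expressed with the CONTAINS predicate, as in the sketch below. It assumes the full-text index on Description from Example 3 has finished populating.

```sql
-- Match rows whose Description contains the word "giant"
-- and does not contain the word "supermassive".
SELECT StarName
FROM Stars
WHERE CONTAINS (Description, '"giant" AND NOT "supermassive"');
GO
```

Unlike FREETEXT, which matches on the general meaning of the search words, CONTAINS gives explicit control over which terms must or must not appear.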

Partitioning Data

When working with large databases, query performance often becomes an issue, even if your indexing strategy is spot-on. If you have decided that indexing is not enough to produce your desired result, your next step can be data partitioning. Data partitioning separates a database into multiple filegroups containing one or more files. These filegroups are placed on different disks, enabling parallel read and write operations, thus significantly improving performance. Approach a partitioning strategy by separating different tables and indexes into different filegroups and placing them on separate disks. As a guide, always separate large, frequently accessed tables that are in a FOREIGN KEY relationship, so that they can be scanned in parallel when performing a join.
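Placing a table on its own filegroup might look like the following sketch. The database name Astronomy, the filegroup and file names, the file path, and the Observations table are all placeholders, not values from the text.

```sql
-- Add a new filegroup to a hypothetical Astronomy database.
ALTER DATABASE Astronomy ADD FILEGROUP FG_Data2;
GO
-- Add a data file on a different disk (placeholder path) to that filegroup.
ALTER DATABASE Astronomy
ADD FILE
(NAME = Astronomy_Data2,
 FILENAME = 'E:\SQLData\Astronomy_Data2.ndf',
 SIZE = 100MB)
TO FILEGROUP FG_Data2;
GO
-- Create a large, frequently accessed table on the new filegroup.
CREATE TABLE Observations
(ObservationID int NOT NULL PRIMARY KEY,
 StarID int NOT NULL,
 ObservedAt datetime NOT NULL)
ON FG_Data2;
GO
```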

If the desired performance is not achieved by simple partitioning, this is usually due to very large single tables. You can employ a horizontal or vertical partitioning technique to split a single large table into multiple smaller tables. Queries that access this table will run quicker, and performance of maintenance tasks, such as backup and index rebuild, will also be improved.

Horizontal Partitioning

Horizontal partitioning splits a table into several smaller tables by separating out clusters of rows, based on a partitioning function. The structure of the smaller tables will remain the same as the structure of the initial table, but the smaller tables will contain fewer rows. For example, if you have a very large table that has 100 million rows, you can partition it into 10 tables containing 10 million rows each. Date columns are often a good choice for horizontal partitioning. For example, a table could be partitioned historically by year—each year stored in a smaller table. Thus, if a query requires data for specific dates, only one smaller table needs to be scanned.
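SQL Server also offers built-in table partitioning, in which a partition function maps rows of one table to partitions by date range. This is a related but different technique from splitting the data into separate tables, and it is only a sketch: all object names are hypothetical, every partition is mapped to the PRIMARY filegroup for simplicity, and the feature depends on the edition of SQL Server.

```sql
-- Three partitions: before 2009, the year 2009, and 2010 onward.
CREATE PARTITION FUNCTION pfSaleYear (datetime)
AS RANGE RIGHT FOR VALUES ('20090101', '20100101');
GO
-- Map every partition to the PRIMARY filegroup (separate filegroups are possible).
CREATE PARTITION SCHEME psSaleYear
AS PARTITION pfSaleYear ALL TO ([PRIMARY]);
GO
-- Rows are placed in a partition according to their SaleDate value.
CREATE TABLE SalesPartitioned
(SaleID int NOT NULL,
 SaleDate datetime NOT NULL,
 Amount decimal(10,2) NOT NULL)
ON psSaleYear (SaleDate);
GO
```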

Analyze the data and how your users are accessing this data in order to derive the best horizontal partitioning strategy. Aim to partition the tables so that the majority of the queries can be satisfied from as few smaller tables as possible. To join smaller tables together, UNION queries are required, and these can degrade performance.
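A minimal sketch of partitioning by year into separate tables, with a UNION ALL view to rejoin them, is shown below. The Sales2009 and Sales2010 tables, the AllSales view, and their columns are hypothetical.

```sql
-- One table per year; the CHECK constraint documents which rows belong where.
CREATE TABLE Sales2009
(SaleID int NOT NULL,
 SaleDate datetime NOT NULL
   CHECK (SaleDate >= '20090101' AND SaleDate < '20100101'),
 Amount decimal(10,2) NOT NULL,
 CONSTRAINT PK_Sales2009 PRIMARY KEY (SaleID, SaleDate));
GO
CREATE TABLE Sales2010
(SaleID int NOT NULL,
 SaleDate datetime NOT NULL
   CHECK (SaleDate >= '20100101' AND SaleDate < '20110101'),
 Amount decimal(10,2) NOT NULL,
 CONSTRAINT PK_Sales2010 PRIMARY KEY (SaleID, SaleDate));
GO
-- The view joins the smaller tables back together with UNION ALL.
CREATE VIEW AllSales
AS
SELECT SaleID, SaleDate, Amount FROM Sales2009
UNION ALL
SELECT SaleID, SaleDate, Amount FROM Sales2010;
GO
-- A query for a single year only needs to touch one smaller table.
SELECT SUM(Amount) AS Total
FROM Sales2010
WHERE SaleDate >= '20100601' AND SaleDate < '20100701';
GO
```

Queries that span years go through the AllSales view, which is where the cost of the UNION noted above is incurred.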

Vertical Partitioning

Unlike horizontal partitioning, vertical partitioning separates different columns of a single table into multiple tables. The resultant smaller tables have the same number of rows as the initial table, but the structure is different. Two types of vertical partitioning are available:

Normalization: Normalization is the process of applying logical database design techniques to reduce data duplication. This is achieved mainly by identifying logical relationships within your data and implementing multiple tables related by FOREIGN KEY constraints.
Row splitting: This technique separates some columns from a larger table into another table or tables. Essentially, each logical row in a table partitioned using row splitting is stored across two tables. To maintain integrity between the tables, use a FOREIGN KEY constraint in which both the PRIMARY KEY and FOREIGN KEY columns are unique. This is known as a one-to-one relationship.

If implemented correctly, vertical partitioning reduces the time it takes to scan data. Use row splitting to separate frequently used and rarely accessed columns into separate tables, and eliminate overhead. The drawback of vertical partitioning is the processing time and resources it takes to perform the joins, when needed.
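Row splitting might be sketched as follows, with frequently used columns in one table and a rarely accessed column in another. The StarCore and StarDetail tables and their constraint names are hypothetical.

```sql
-- Frequently accessed columns.
CREATE TABLE StarCore
(StarID int NOT NULL
   CONSTRAINT PK_StarCore PRIMARY KEY,
 StarName varchar(50) NOT NULL,
 StarType varchar(50) NOT NULL);
GO
-- Rarely accessed column. StarID is both the PRIMARY KEY and a FOREIGN KEY,
-- so each StarCore row has at most one StarDetail row (one-to-one).
CREATE TABLE StarDetail
(StarID int NOT NULL
   CONSTRAINT PK_StarDetail PRIMARY KEY
   CONSTRAINT FK_StarDetail_StarCore FOREIGN KEY REFERENCES StarCore (StarID),
 Description nvarchar(max) NULL);
GO
-- The join is only paid for when the rarely used column is actually needed.
SELECT c.StarName, d.Description
FROM StarCore AS c
LEFT JOIN StarDetail AS d
  ON d.StarID = c.StarID
WHERE c.StarName = 'Deneb';
GO
```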
